So we are super excited to have Mark Raasveldt from CWI come give a talk for us today. Mark did his PhD at CWI and is now halfway between being a postdoc and halfway between being a PhD student, because he hasn't defended yet.

"Yeah, but you are officially a postdoc, right? They pay you like a postdoc?"

"They're paying me almost like a postdoc. I'm in between. I'm a pre-doc, it's kind of nice, you know? A pre-postdoc."
Okay, so DuckDB is probably, in my opinion, the most exciting academic database system to come out in the last two years, and as the title says, it's SQLite for analytics. I don't think there's anything else to say; that's Mark all summed up: DuckDB is awesome, so go for it. And the way we'll do this is that people can just unmute themselves and ask Mark questions as we go along.

"Yeah, no problem. At any point, just ask. Don't literally shout, it is a bit late here, so I guess my neighbors wouldn't like that, but just ask, that's fine."
All right, so I guess I don't need to introduce myself; I could have skipped this slide. Thank you for the introduction and the invite, and I'm very happy to be here as well. So yeah, in this picture you can see Hannes and me down there drinking beer at almost-Oktoberfest. We made DuckDB, the database system, and I'm here to present it today.

"Can I just interrupt? Go back to the last slide for everyone. In the middle, that's Peter Boncz, the inventor of VectorWise, another CWI guy who has cranked out amazing systems."

"Yes, yes. So Peter is not super involved in actually building DuckDB in terms of the code base, but we definitely ask him for advice quite a lot. He is way too busy with tons of other things; he has no real time to code that much anymore. But we definitely talk to him a lot. I mean, we used to share an office before all this mess."
All right, so basically this presentation has two parts. The first part is a talk I already gave at CIDR, because, well, it's too late to prepare anything new. And then I have a bonus part, lessons learned building a database system, which I think is interesting, or could be interesting, to people that might be building a system right now. Like Andy. All right.
Before I talk about database systems, I first want to give a bit of motivation for why we built DuckDB in the first place, and that motivation kind of started with data science. Well, what is data science? I just call it statistics on a MacBook. There are a thousand different words for it; it's a very overloaded term. But I think one thing that's kind of obvious about data scientists is that they work with data. It's in the name, right? And then we have us, the database researchers. In our field, we've been working for more than 50 years, I think, at perfecting database systems. So obviously data scientists, who work with data, must be using them, right? Well, unfortunately not. They use them if they are forced to, such as when they have to get data out of a database system in the first place, but they generally kind of prefer to avoid them, which is a bit sad. It makes the databases sad.

So here's a quote from Hadley Wickham. I can't actually read it, because I blocked my quote with this overlay: "If your data fits in memory there is no advantage to putting it in a database: it will only be slower and more frustrating." If you don't know Hadley Wickham, he's kind of like the R guy. He made kind of all the popular R packages, like ggplot2 and dplyr; he's a huge guy in data science. And he is saying this about our field, about our life's work, in a lot of cases. So it's a bit sad.
So why is he saying that? Well, maybe they just don't need database technology. So what do data scientists actually do? I think probably the easiest way to figure that out is to just look at the packages they use. If you look at popular data science frameworks in Python, you see the top two are NumPy and pandas, which is pure data wrangling. In R it's very similar: the top two are dplyr and, down there, data.table. It's all data wrangling libraries. And what do they all do? Well, they do database operations. Sometimes they do a bit more, but some of them do just database operations, or even less than that: they do joins, aggregates, basic transformations, filters. So it kind of turns out that data scientists do need all this functionality that we are building in our database systems, but somehow they're not using them. So let's go back to this quote.
Why does he think that database systems are slow and frustrating? I mean, we've been working on optimizing database systems for, what, 50 years or more? Not me personally, but our field. So to figure it out, probably the easiest way is to just run a data science workflow with and without a database system and then just see what happens. A typical data science workflow is kind of simple: you load some data from a CSV file, you do some ETL, so pre-processing, then you run your analysis. A really simple sort of workflow.
For the example that we ran here, we have some voter data from North Carolina voters. It's just a data set we had lying around, which is around 300 megabytes in CSV format. Then we do some, in this case very nonsensical, pre-processing. Oh no, this part is not nonsensical: we remove entries with a missing phone number or house number. And then we compute the correlation between phone number and house number; that is the nonsense part. I couldn't read it again because this thing is blocking my view. So of course the correlation between phone number and house number is expected to be zero.

So let's do this in R first. We install some required packages, data.table and dplyr, some basic wrangling packages. Then we run our analysis: first read the CSV file, then apply the filter, and then check the correlation. It's three steps, three lines of code, and indeed we see no correlation between phone number and house number. Surprise. It's quite easy and convenient.
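The equivalent workflow is just as short in Python. A minimal sketch with pandas (the column names and inline data here are made up for illustration; the talk used R and the real North Carolina CSV file):

```python
import io
import pandas as pd

# Stand-in for the real 300 MB CSV file; columns are hypothetical.
csv = io.StringIO("phone_number,house_number\n5551234,12\n5559876,\n5552468,77\n")

df = pd.read_csv(csv)                                    # 1. load the data
df = df.dropna(subset=["phone_number", "house_number"])  # 2. pre-process: drop missing entries
corr = df["phone_number"].corr(df["house_number"])       # 3. analyze: compute the correlation
```

Three steps, three lines of actual work, no server anywhere in sight.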
So what happens if you do this with a database server? Well, for that, let's just use Postgres, because it's kind of the open-source database. First we have to install the database server, and this is not actually trivial without a proper package manager. For us, at least on our cluster, we don't have root access, so we have to download the sources and compile them. After we do that, we have to set up some cryptic environment variables, initialize the database, and use this very descriptive pg_ctl command (because when this was created they had to save letters on the command names) to initialize the database.

Then we install the client. At least this is easy, right? We just install it through the R package manager. Except it doesn't work, because the client actually requires you to have the database server itself on your path for some reason. Okay, fine, we put the server on the path and install the client again. And then we run: now we connect to our database system, providing the host, the database name, the client, whatever. We create a table, and I'm actually being nice here, because you see these dots? That stands for around a hundred different columns, because it's a very wide data set; it's a very, very big statement, actually. Then we send a query where we use string concatenation to construct a COPY command to read the CSV file, and then we can actually run a DELETE to filter the table and read it back. So you can already see that all of these steps are pretty annoying: the installation is more difficult, the usage is more difficult, there's a lot of pain here.

What about the time? Well, it's a lot slower as well, and the whole reason it's slower is actually this client-server connection, where you are basically pushing this whole table, which is around 300 megabytes, over a socket connection, and that just costs you like ten extra seconds. So it's annoying and slow.
In the end, Hadley is right: for these types of workflows in R and Python, database systems are frustrating and they are slow. So what are the problems here and how can we fix them? The problems that we have identified are that they're difficult to install, difficult to set up and maintain, they have really slow data transfer to and from the client, and they kind of have a poor interface with the client application.

So first, of course, the client-server architecture: it's nice for multiple users, but for a single user it's pointless; it just makes everything more cumbersome. The installation needs to be easy: if you use Python, you need to be able to install it all through pip install. Anything else is already way too cumbersome, because then you're looking into OS-specific package managers and fiddling with compilers, and that's just not happening; it's too much.
Transfer efficiency is very, very crucial, which we also have a paper about, from VLDB some years ago. Actually, the Oktoberfest picture you saw before was taken at that VLDB. The database client protocols are not optimized for this kind of bulk transfer scenario, and the data transfer is necessary, because data scientists can't run everything within the server. They need to use their machine learning framework of the week, or of the month, whatever, to run their data science; you're not going to do a deep neural network within SQL. So they need to transfer this data at some point. And finally, there's SQL.
It's a lovely language, and I very much like SQL, but as an interface for clients it's kind of problematic, because you have to embed SQL as a string, and that's kind of unnatural: you don't get syntax highlighting in most editors, and it has actually introduced a whole class of security problems with this interface, which we call SQL injection. It's also not that easily composable. If you look at dplyr, for example, you can chain operations: you have a filter and a projection, you can chain them, and you can look at intermediates by just pushing a head() call in between at any point. In SQL you can kind of do that with temporary tables, but you have to write a lot of cruft. You see there's an extra SELECT * FROM being pushed everywhere; there's a lot of cruft around it that you just don't need in this dplyr-like interface.

So if current database systems are not a good fit for data science, why do we care? Well, the answer is simple: money. Data science is very popular, so we can make money, we can get grants. But maybe a more fundamental reason is that data scientists are kind of reinventing database systems. They are re-implementing joins themselves; they are kind of recreating a database system, and they are doing a very poor job of it. They're basically ignoring all the research that we in the database field have worked on in the past 50 years: they don't use optimizers, they materialize huge intermediates, they don't support out-of-memory computation, they don't have any parallelism, they don't scale up. So they are kind of building a poor man's database. But it gets a lot of users, just because it has a good interface, a good connection to the R and Python communities, while databases could solve all these problems.

"Hey Mark, wondering if I can ask a question, if you don't mind?"
"Of course, yeah."

"By the way, my name is Ling; I'm a PhD student here. The previous few slides, and also the early introduction, got me thinking. Especially at the beginning of the talk, when you were talking about the most popular libraries in Python being NumPy and pandas: just from my own perspective, as someone using those libraries, I feel the functionality and features of NumPy and pandas are a little bit different. In pandas you have lots of these database-like operations, like filters, joins, transformations, etc. But in NumPy, much of the time you are using arithmetic or numeric operations, like matrix multiplication or broadcasting, or you have this very convenient interface to quickly index a matrix at different rows, columns, etc. So I think that part would be a very important reason for using NumPy."

"You definitely have the matrix stuff as well; that is an excellent point. Indeed, that is different, but still, a lot of the stuff in NumPy is also present in database systems. And that's also kind of the reason why you need this convenient data bridge between the two: once you start wanting to do things like matrix multiplication, for which the relational model is maybe not so suitable, you can easily go back and forth and use what each system is good at, the strength of each system."

"Okay, okay, I see. That makes sense. Thank you."
All right, so where was I? Here, I think. Yeah, so this was actually kind of the bridge between the two things: indeed, databases are slow and frustrating, but they don't need to be. Cue DuckDB, our database, which we call SQLite for analytics. The reason we call it SQLite for analytics is that SQLite is really the easiest-to-use database out there, if you have used it, and we really, really like it. But it has one big flaw, and that is that it's quite slow once you have to do any sort of analytics. So the core features of DuckDB are actually very similar to SQLite's, with a few big exceptions: there's a very simple installation, which I will go into in a bit more depth; it's embedded, so you don't have any external server management; we have a single-file storage format, just like SQLite; we have fast analytical processing; and we have very fast transfer to and from R and Python, and the database system comes with a convenient API for this.
The first question you always get is: why "DuckDB"? Well, ducks are amazing animals: they fly, they walk, they swim, they're resilient, they can live off anything. And Hannes used to own a pet duck called Wilbur. Here's a picture of Hannes, my former supervisor, now a co-worker, still kind of a supervisor, with his duck on the boat where he lives, because he's a bit crazy, but the good kind.

"So you said he used to own a duck. Where's the duck now?"

"Well, the duck found greener pastures. He flew away to form a little duck family, we assume."

"Wait, it was a pet duck in his boathouse, and then it just went, like, 'I'm out of here' and flew away?"

"Well, the duck had complete freedom. It wouldn't be in a cage; it would just swim in the canals, well, not the canals, the ponds. It would just swim around, fly around, fly back. Sometimes it would be gone for one day, two days, but it would usually come back at some point. Then it just didn't come back, so we assume it met a lady duck and made a beautiful family."

"Did this happen, did the duck disappear, before you guys started DuckDB?"

"Yeah, yeah."

"That's a great backstory for a name."

"Thank you. It was a very sweet duck. I watched it grow up, so I'm still missing Wilbur."
All right, so DuckDB is a very young project. I made the first commit one year and nine months ago, on the 13th of July 2018. As you can see, it was a very bare-bones commit: I just added a parser. And the initial system was actually heavily inspired by Peloton: when I started making DuckDB, I just looked at the parser and the binder from Peloton and took about half of it. Here you can see a code snippet from the early DuckDB, and if you are still familiar with the Peloton code base, maybe it looks a lot like it, because it actually is. Many names are still present: we have a DataTable, a BindContext, a StringUtil. By now it's mostly been rewritten, besides some small things: the StringUtil methods are still there, and so are the same exception types.
So here are the internals at a glance. It's a columnar database, surprise, since it's for analytics. It uses vectorized processing, which is probably a surprise to the people who prefer JIT compilation. There are a lot of reasons why we chose vectorized; we have had a lot of debates about this. The main reason we decided on it, though, was that we wanted to avoid the compiler dependency, which is very problematic if you want the user to install the system. But there are a lot of good sides to vectorized processing anyway; when you compare it with JIT, they both have their advantages. We use an ART index. We use MVCC that's inspired by the one in HyPer, but it has been adapted, kind of, to be optimized for a vectorized engine. And we use single-file storage, which looks a bit like this. It's all C++, and we use the Postgres parser, in a version that was heavily inspired by the Peloton rip-out of that parser.
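To make "vectorized processing" concrete, here is a toy sketch in pure Python (not DuckDB code): each operator consumes and produces fixed-size batches of values instead of single rows, so the per-call interpretation overhead is amortized over a whole vector.

```python
VECTOR_SIZE = 1024

def scan(data):
    # Emit the data one vector (fixed-size batch) at a time.
    for i in range(0, len(data), VECTOR_SIZE):
        yield data[i:i + VECTOR_SIZE]

def filter_gt(vectors, threshold):
    # One interpreted operator call per batch, not per row.
    for vec in vectors:
        yield [v for v in vec if v > threshold]

def sum_agg(vectors):
    return sum(sum(vec) for vec in vectors)

data = list(range(10_000))
result = sum_agg(filter_gt(scan(data), 9_000))  # pipeline of batch-at-a-time operators
```

A JIT-compiling engine would instead fuse these operators into one generated loop; the vectorized design gets most of that benefit without shipping a compiler.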
"Can you talk about the changes you made to the HyPer MVCC, like, to deal with a vectorized engine?"

"Well, it wasn't so much to deal with a vectorized engine as it was to deal with bulk updates and columnar updates. I'm actually still looking to write a paper about this, but I'm kind of procrastinating. The thing is, the HyPer MVCC has one big pitfall, which is that it always copies entire rows to the undo buffers. I don't know how familiar you are with this."

"We've done the same thing; we implemented it."

"Okay. So it has this one big pitfall of copying entire rows into undo buffers, which for inserts and deletions is kind of fine, but for updates gets pretty messy, especially when you have to deal with large columnar updates. Because if you run an update that touches, say, one column but ten percent of the rows, you're copying ten percent of your table into an undo buffer, and it might be one column out of one hundred columns. So we fixed that."

"Right, we don't do that either. Is that correct?"

"Yeah. And the other issue with the HyPer MVCC is the single writer, and I think we got around the single writer."

"There's some aspect where they only have one writer at a time, or something."

"Some aspect of it, yeah. We don't really have this, at least: we have parallel writes to different tables, and also to the same table, kind of, with some switching around on some latches here and there."

"One thing that we do have is a single committer; maybe that's where we're lighter, with the write-ahead log, one committer."

"Yeah, we don't do that either. I'm not sure about this, but we are also not really that concerned with very small point updates or point writes. We are mostly concerned about bulk operations, and that's why we made these changes to the MVCC, to deal with bulk updates specifically. So we also have a different versioning scheme: our versions are actually vectors, not individual rows, also to deal with this. I will write a paper about this at some point. I also have a bunch of slides I can send you if you want to take a look."

"Yes, we should. The student that wrote the MVCC stuff with Matt is not here; he's now doing a PhD at MIT. But we should swap notes at some point."

"Yeah, for sure, that sounds excellent. I spent quite some time on the MVCC, first on implementing it in the first place and then on revamping it, so definitely, that sounds excellent."
Sorry, all right. Oh, I notice this slide actually got copied here again. Oh no, actually I didn't accidentally copy it, aha. So let's go back to the database problems that we had before, and how DuckDB addresses them. First we had: difficult install, difficult setup, slow data transfer, and a poor interface.
All right, so first, installation and usage. Here we actually learned a lot of lessons from SQLite. First off, it's an embedded system, and it has zero external dependencies: all the dependencies are completely inlined. We have an amalgamation build of the system, which means that we can compile all the source files you need to build the whole system into two files, a header and a source file. It takes some time to compile, like a few minutes, but it does work. We have also integrated DuckDB entirely within the package managers, so you can just type install duckdb and it will fetch the latest version.

"Does it actually compile when you do pip install? What does that do?"
"Well, it depends on the system. I think on Linux it compiles; on OS X and Windows it fetches a binary, so it just gets a DLL and dumps it there."

"Also, coming from a Python background: you can actually get wheels, OS-specific pre-built binaries. We do that, often, to speed things up."

"Yeah, we use wheels as well. I didn't set this up myself, so I don't have that much insight into the wheel part, but we do use wheels, yes."

"So for something like Windows, the environment should be pretty consistent, and I guess OS X as well. But what happens if they have some funky setup, you know, they're using the North Korean version of Linux, and they try to install this and it doesn't compile? Do you get an error, do you get a message?"

"Usually, the way this type of C++-dependency package is set up in Python is that it will fall back on compiling: if it can't find a wheel, it will invoke the system compiler to compile from source."

"Right, but the question is: if it falls back to compilation, and you're running on some weird flavor of Linux, and compilation fails, does the DuckDB team get notified that it failed, or do you see nothing?"

"Hopefully they open an issue. We don't have any sort of reporting back, or anything like that. We don't really believe in that either, at least not invasively, let's say."

"Of course. Let's keep going."

"All right, sure. One more thing:"
simple installation
Doug DB does not rely on any external
states which is separate config files or
environment variables or like locale
things which can be pretty messy
otherwise and we have the single file
format which users typically find easier
to handle see for example Microsoft
Office then we are also having that we
also have a composed Wayne's face which
is still kind of a work in progress and
this interface directly connects the you
to the duck TB system without having to
type sequel queries you can still type
sequel queries we support both and ideas
that you can compose these sort of
operations so if you want to have say a
table scan followed by filter followed
by and aggregates you can have this sort
of like DB table then with the name of
the table you wants that gives you a
table scan then you build a filter on
top of that with like a expression and
then you can inspect any sort of
intermediate there then you can run any
errors on that again like all of this
kind of return is like relation in
relation out and it's kind of a very
hidden mapper over the logical operator
tree we have internal one nice part also
One nice part about being embedded is that we can integrate very easily with things from the host environment. So we have a scan of a pandas DataFrame, and we can also produce a pandas DataFrame result. The scan of the pandas DataFrame is basically zero-copy, except for strings, which are nasty. It will just run a scan of your pandas DataFrame through the entire DuckDB query engine, and on the other end the result can just be a pandas DataFrame again. Then you benefit from the vectorization and from our entire engine; you can basically run whatever you want on a pandas DataFrame. We can also just run SQL on a pandas DataFrame, no problem, which is quite nice.
is quite nice yes sorry interruption
just wonder how can you do a quick scan
or data frame it's like your storage
form is compatible with data frame
format or you do some transformation
it's like there is four so it's not
about the storage formats
it's about our scan operator that emits
like vectors basically right and these
vectors can point to an array of data
and for in stores and doubles and such
it's just an array of integers or
doubles or what-have-you and palmas just
has the same airy internally right so
because we also run inside the process
of the user we can just get that area
and run on that in our system for
strains it's a bit more complicated
because for Strings they have Python
string objects a bit more of a mess but
for numbers you can do it without any
copying just set some pointers and go I
think lens question is that duck DBE
knows about the data frame for yeah yeah
yes yeah we have a function which is
essentially a table producing function
that has like a pointer to a data frame
and then just scans that in memory
according to mark it sounds like that
they'd have frame format if just array
except for strings internally you
promise data frame is just a bunch of
numpy arrays right yeah sure and the non
PI rays are just erase just see your
race right and we also use
race right and we also use
series right yeah so kind of this
So, kind of as a conclusion of this part: DuckDB is pretty feature-complete. Not entirely: we still need parallelism, which I'm working on, and compressed storage, which I will work on after that. We run the full TPC-H benchmark, and almost the full TPC-DS; I have not implemented full outer join yet, which makes us fail two queries, but I will get to that as well. We have window functions, complex aggregates, arbitrary subqueries (that's also very fun, by the way, arbitrary subqueries; I could give another talk about that), compressed execution, and, I can't read the other one, string collation.

"Yeah, that one was from me."
"So, did you use ICU?"

"We actually use utf8proc, it's called. It's a very lightweight alternative to ICU that was created by the developers of Julia, if you know it. It doesn't actually support all the things that ICU does, but it comes quite close: it supports string composition and decomposition, grapheme cluster detection, these sorts of things. It's not as impressive, but this is what we use, because we really don't want to depend on ICU. So we don't actually support full string collation either, like all the locales, not exactly, but we probably will add that as an extension at some point, because, well, you don't want to link ICU just for that. I think that's a big waste."
It's a big thing, yeah. So: DuckDB, free and open source, very pre-release. We have a website and a GitHub; feel free to try it, or take parts of it if you're by chance making a database system. And if you find anything that breaks, please send us a bug report.

That was the end of my CIDR presentation, but I think I will continue a bit, because I still have some lessons learned that I think might be interesting for people that might be, coincidentally, also developing a database system. Potentially. Allegedly.
All right, so just some quick notes on testing. What we have learned is that for a database system, it's probably best to always just test SQL. In a way, database developers are kind of blessed, because we know the requirements of our system: it has to run SQL correctly. And the internals of the system are only useful insofar as they work towards that goal of running SQL. So testing internals is easy to do, but it's a bit of a trap, we have found: it makes changing them later on very difficult, and you will kind of need to change them if you want to refactor at some point.

Some history, maybe, of what we refactored or rewrote: we rewrote the vector implementation twice, once to add support for arbitrary vector buffers and once to add support for compressed execution. I reworked the binder like three times to get arbitrary subqueries to work.

"That's annoying, yeah."

"If you actually want to take my, or our, binder, feel free. It does support arbitrary subqueries, sorry, arbitrary correlated subqueries, so I'm pretty happy with the implementation we have now. There's still a bit I want to change, but..."

"It's kind of funny, the binder, right? It's one of those things nobody talks about, but it's actually quite complex; a lot of things happen there."

"Yeah, yeah. So the planner was also rewritten a bit, and I think when we were rewriting all this, we were quite lucky that we did not have many internal tests. The ones we did have actually cost us some pain: we had a bunch of tests that tested the vector representations directly, and I had to fix those while I was refactoring, which was just kind of annoying; I ended up just removing a lot of them. So I think 'test SQL' is kind of the holy grail here: it actually makes it a lot easier later on to refactor your system."
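What "just test SQL" means in practice is something like this sketch: the test knows only queries and expected results, never the engine's internal data structures (sqlite3 stands in here for the engine under test):

```python
import sqlite3

def run_sql_test(setup_statements, query, expected):
    # The test's only contract with the engine is SQL in, results out,
    # so internals can be refactored freely without touching the tests.
    con = sqlite3.connect(":memory:")
    for stmt in setup_statements:
        con.execute(stmt)
    assert con.execute(query).fetchall() == expected

run_sql_test(
    ["CREATE TABLE t (i INTEGER)", "INSERT INTO t VALUES (1), (2), (3)"],
    "SELECT sum(i) FROM t",
    [(6,)],
)
```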
Another point: make testing fast. If testing is slow, you don't do it as often, and it's also developer time you're spending waiting for it. So we have split testing into a fast and a slow path. The fast tests just test functionality on small data sets, and they actually test a lot: you can see here that the fast tests run in seven seconds in release mode, they trigger about 150,000 assertions, and I think they get 92 or 93 percent code coverage or so. But they test only small data sets, so they run very fast. And then we have a slow path that tests larger data sets and tests concurrency, where we have parallel updaters and parallel inserts, and where we try things like sending a kill to the database and seeing whether it gets corrupted, those sorts of things. That's all on the slow path. Those are run as well, but when you are just developing something, or refactoring, you just run the fast tests, and for us that's very convenient.
"What unit-test framework are you using for that? Is it homegrown, or is it... what is it?"

"Catch."

"Okay, yeah."

"We really like Catch because, again, it's no external dependency; it's just one header. One annoying thing, though, is that because it's one header, it slows down compilation somewhat, which is why we use unity builds now: all the files that are together in the same folder get compiled together as one batch, because otherwise the compilation takes about twice as long. It's not super long, but it's kind of annoying if you're running on a MacBook from five years ago, you know; it's going to take at least three or four minutes for a full compile, so it kind of blocks you, takes you out of the flow."
All right, so, verification code. I think that's also very important. In the fast tests we actually run every SELECT statement four times: once as the original query, obviously; then we run the query with all optimizers turned off; then we run a copied query, so we have a copy function for expressions, for select nodes, for everything you can copy, so we copy the query and run it again to verify that the copy works; and then we also serialize and deserialize and run it again to verify that the serialize and deserialize methods work. And I think this is very nice, actually, because copy and serialize/deserialize are very hard to test for, but this basically makes them just work: if you forget to copy one element, then your query is not going to return the same result, and you just catch that. When you say unoptimized, do you mean unoptimized C code, or unoptimized in the planner? In the SQL: we turn off the whole optimizer and run whatever comes out of the planner straight. Got it, okay. We only do this for the fast tests, obviously, because this also means we don't run join order optimization, which means we usually just run a bunch of cross products, which is actually great for catching bugs in your join implementation, because now any sort of join will be run against a cross product plus filter, and if they don't match up, you probably have a problem in your join. Interesting, okay. This has caught tons of bugs for us; it's really, really good.
It has been very good for us. So the next advice I can give is: make an embedded database. Even if you're not making an embedded database, make one anyway, for testing. In the end, the wrapper around a database server is kind of a wrapper around an embedded database anyway, and an embedded database just makes it much more convenient to write tests, and much faster, because you don't need to deal with this whole client/server stuff. If you are sending individual SQL statements that run on one row or two rows, all this overhead actually adds up and makes your tests run a lot slower, which again is kind of annoying if you have to run them a lot. So we have gotten a lot of good stuff from having an embedded database. Before this I worked on MonetDB, which does not have that, or did not have that until we actually added it, and their testing framework takes around half a second, I think, for every single test they run, because they start up a database server, start up a client, connect to the database server, run the query, and check the results. As a result, they just run a lot fewer tests: their fast tests take like 30 minutes to run, whereas ours take a few seconds, and we actually run more tests in that time. So it really makes a difference in how much you can test as well, right, because there's always a limit.
Then, automatic benchmarking is very useful. I know Andy has it, or wants to add it; it's work in progress in the project. Okay. It detects unexpected regressions, which is very useful. Here's something it actually found this morning, because one thing that we don't have yet, which we also want, is automatic notification when something actually breaks. So here was a test that had been running for a year that broke down in December, and we only found out now, because nobody was actually looking at this test; it's just somewhere in our micro-benchmarks. But it actually found kind of a bug: here you can actually see it by the commits. We have a list of commits; if you go on our website you can see it, and you can see, hey, this commit actually broke it, so you know it's here. And what actually happened here is that this BETWEEN expression broke the index optimizer, which made this query not use an index scan anymore but a regular scan. So, great catch; we would probably not have caught it otherwise. You don't catch these things in unit tests as much, right, because the answer is still correct.
All right, kind of the last point, I think: be careful with dependencies. It's something that we hold to heart, partly because we are forced to by our work environment. Dependencies are very easy for the developer, but they can be very difficult for the user: for example, they're running a different operating system, they might not have a package manager, or they might not have root. And this is actually our scenario: on our work machines we don't have root, which means we can't use the package manager, which really sucks when you want to install things that have a lot of dependencies; otherwise it's fine. And of course packages can break; we all know about the left-pad fiasco. If you have a dependency chain, everything can break down if you have one bad package in there, so dependencies are a liability, and you should use them with caution. Don't include a dependency on Boost just to parse one number from a string, please. We actually found this in a Python Parquet reader that did this; it has like 20 dependencies, but this one was the most ridiculous: it was parsing one number, and it pulled in the whole of Boost for that. Ridiculous.
And here's a nice quote from the tinyverse people, who are kind of on a mission to have fewer dependencies: every dependency you add to your project is an invitation to break your project. Right. Yeah, so like I said, we can't use root on our work machines, and since I have not complained about something yet: I actually tried to use Terrier there, and I got met with this message, so I was not so happy; I couldn't use it without first compiling jemalloc. So maybe for you also, it would help me a lot if you guys were to not use dependencies. Okay, so a couple of comments. One is: do you guys use jemalloc? No, we don't have it right now; we probably will add it, but if we add it we might even inline it, or add it as an optional thing. Right. So, we find it gives us like a 20% boost; yeah, that's why we have it.
You say you have no dependencies, right? Not a lot, but we have inlined dependencies; we use the Postgres parser. Yeah, you've copied the code you need into a third-party directory and it gets compiled along. So how do you decide? Is it just ad hoc? I mean, it's just you and Hannes; how do you decide 'we need this feature, we can roll our own, or we can bring in this dependency'? Do you have a way of checking that, given your environment? It's mostly what we find would be a reasonable use of our time, and our criterion is also a bit: how easy is it to inline in the first place? We are making these amalgamation builds, which means that, well, first off, everything has to compile as C++, so we have actually spent some time converting C dependencies, like the Postgres parser, the actual grammar and the yacc stuff, to C++; all of this has to be simple for us. And the problem is that if you want to include one library, you have to include all of its dependencies as well, so it very much depends on the cost-benefit trade-off, and we kind of just make that call ad hoc, I would say. The ones we use are the Postgres parser, because we don't want to write our own parser; utf8proc, because we don't want to write our own Unicode framework, because that's a lot of effort and annoying; and I think we have snappy in there somewhere, for example, because why would we reinvent snappy? It's 'snappy'? Snappy, sorry; I was pronouncing it the German way, that's why. Okay.
Yes, so that's all the things I wanted to talk about. So, in Zoom there's an applause button; hit the applause button. I don't do it, it's an empty gesture, but you want to do that to thank Mark. So why don't we go around, and if anybody wants to ask questions you can; I guess you raise your hand and we'll unmute you.
Okay, questions. I was going to say: you didn't talk about how you're built on top of SQLite, or whether you implement the SQLite API. So when you say it's an embedded database, is your code sending SQL over JDBC, or...? No, there's nothing like that; it's really just like SQLite: we have a function called query that takes a SQL string, and that feeds directly into our engine; there's nothing else. We do also implement the SQLite API, but that is just: we have the header, and we have our own implementation, which is DuckDB underneath. So we have that as well, but there's no client; we don't really have any protocol of any kind. We have some REST server that we use for one thing, but it's not really serious; it's not our focus.
Okay, and then for the fast and slow tests: on the command line, do I call make unittest or whatever, and that gives you the fast stuff, and then at night, or when you push to GitHub, you run the slow stuff? How does it work? It very much depends on what you want to do. If you want a basic verification that your system is not broken, you just run the fast ones. If you want to debug a test, you obviously run just that specific one; you don't run everything. Before you push to GitHub, you always run all the slow ones as well; they take maybe 15, 20, 30 minutes, depending on your machine. Or you can just fire off Travis to build it. And what are you running there, the slow stuff? Yes, well...
Actually, we have a pretty complex Travis setup. We run a bunch of different compilers and systems; of course we build our packages, the Python packages, on Linux, OSX, and Windows. We run from GCC 5, I think is the lowest one, up to the newest, and we run Clang in different versions. We also have some extra verification stuff: for example, we have this vectorized engine, and we have one build where we run with a vector size of two. Our normal vector size is a thousand, but running with a vector size of two actually detects some other bugs. Then we also have different compilation options: we compile the amalgamation, we do our normal compile, and we test all of those in Travis. I think we have like 17 or 18 Travis jobs. We also actually run on Solaris, because we need Solaris to work, because there are a bunch of old people that like Solaris for some reason. Okay, yeah. All right, any other questions from anybody else?
Question: so, you said you rewrote the binder... wait, you should introduce yourself. Hi, I am someone who is currently maybe going to rewrite a binder in the near future. Oh, very good. He's an undergrad here at CMU; he's starting his PhD with me in the fall, so you will come across Mark, and Mark will come across you in the future, for sure. Yeah, if we still have conferences then. So I was kind of wondering what your main takeaways were for how you should go about writing a binder so that you don't have to rewrite it again, and also whether you're annotating existing objects, or whether you're figuring out what information needs to be generated and storing that on the side.
So it's a bit tricky, I think. One thing that was kind of crucial for us was subquery binding, which is quite tricky. We started with, I think, something very similar to your binder, which gave us a lot of problems with binding correlated subqueries. So we added this nesting of binders, where you have different nesting levels of binders, but also on the expression level, so you have an expression binder that has a different layering as well. Because, for example, you can have a subquery in your SELECT clause that can bind to the variables that are available in the SELECT clause, right? But within that you can have another subquery that has a WHERE clause, which has another subquery in its WHERE clause, and then that one needs to bind in the WHERE clause again, and you can bind at all these different levels. So in the correlated subquery you need to bind the SELECT clause of the outer query, and you need to bind the WHERE clause of the inner query, so you have this layering of different expression binders depending on where you are, right? Because if you are in, say, a SELECT clause, you can bind aggregates, but if you're in a WHERE clause, you can't; so you have this kind of context there. I think that's kind of the way we structured our binder.
Now, one other thing that's important is that, at least for us, we want to do statistics propagation throughout the binder as well, because we do overflow prevention using those statistics. So you have to be able to track those statistics throughout the plan, right? Because you don't want to just give up your statistics when you, say, leave the SELECT; you want to go: okay, I have my statistics from the table, I go through my filter, through my WHERE, I go to my SELECT, and you propagate all those statistics. Your binder is doing that? Actually not right now, not anymore; we did it, but... okay, we used to have this, but we're going to re-add it; I removed it in one of the reworks. The reason we did, or want to still, is that we need this for overflow prevention: we want to guarantee statically that an expression cannot overflow, and if it can't, then we can use a fast path in the computation, right?
So by statistics, do you mean things like histograms and cardinalities of the columns? Min, max, has-Unicode, you know, because for string operations there's usually a fast path. I was just asking about those things; what about the other things, like histograms, cardinality estimation? The cardinality estimation is done somewhere else, and we don't have a very good one right now, actually; it's one of the features that's kind of missing. That's done in the join order optimizer, not in the binder.
Kind of a quick question, actually, going back to what you said earlier: when you say it started off kind of similar, or looked similar, to what we do, do you mean the old one, or do you mean Terrier? No, I mean Peloton, but I guess you've got some code similar to that. No, I know; we rewrote the binder, but we've written it twice, so third time's the charm.
Exactly, yes. Do you use the binder to do validation? I just searched, and DuckDB also supports prepared statements; how does the prepared statement code path differ from the binder code path when it comes to, for example, validating types? So, the prepared statements: Hannes actually implemented those, but I do know something about them. They need to have a type, right, and in the binder we do all the type resolution. This is one thing we had to add as well, for the function interface that we have, where you have this propagation of all the SQL types. I don't know exactly how it works for you guys; we have a separate type internally and in SQL, and the reason for that is that we actually use the same internal type for different SQL types. For example, a date is just a 32-bit integer, which means we have a SQL type DATE and an internal type INTEGER, and all of this SQL type information only exists during the binding phase; after that it's all converted to the internal types, and the binder does all this type resolution. And I kind of forgot your original question... whether the type resolution also has to happen for prepared statements. Yeah; so for prepared statements, what happens is that we have a prepared statement parameter in the binder that has an unknown type, and basically when you have an unknown type it goes down, and when it encounters something that gives it a type, we assign that type to the parameter. That can be, for example, a function: if you put a prepared statement parameter inside a substring, as the first argument, it has to be a VARCHAR, right, there's no other possibility, so then we add a cast to VARCHAR and the parameter becomes a VARCHAR parameter, and then you can only assign a VARCHAR value to that statement, if you will.
That makes sense. And since DuckDB is MIT licensed, we can just look at the binder. Yeah, yeah; and if you have any questions, shoot me an email or we can have a Zoom chat. I'm sitting at home all day anyway, and I don't mind answering questions. Okay, thank you.
Just a very high-level question I have, out of curiosity: of all the components of DuckDB, the storage, execution, transactions, the concurrency control code, the binding and catalog, which part, if you can give a rough estimate, did you spend most of your time on, or find the most difficult to deal with, and why? This is difficult to say, because we have not given each component the same focus. For me personally, the most difficult component was the storage, and the storage is actually the least polished part of the system, I think. The reason it was so difficult is that our group has not actually done that much in storage. In MonetDB, which is one of the other databases we have made, they just mmap some files and that's it; that's what they do to this day, I think. Vectorwise has a very complex storage that was written by basically one guy who left 15 years ago; I've not personally worked with the storage there, but from what I heard it's kind of a mess. Basically, as a group we focus a lot on analytical queries, query execution, these sorts of things, so for me those came very naturally; query execution was quite easy to do, which is also why I think the execution framework is probably the best component. For the storage, I had to relearn how to write a buffer pool and such. I see, thanks for sharing.
This is Paulina; I'm a master's student in the computer science program. I was wondering if you could talk a bit about... you said that you have implemented compressed execution and that you're also working on compressed storage, and I was wondering if you could tell us about that. Yeah. The way we have implemented compressed execution is that the vectors that propagate through the system have a type, as in an integer type or a string type, but they also have a vector format, which is the actual physical representation. The physical representation can be anything: it can be dictionary compressed, it can be a flat array, it can be a single constant as well, where a single constant is logically replicated but physically stored once. These vectors flow through the system compressed, in whatever format you want, and whenever you get to an operator, the operator can decide what to do with it. So we have a method that we call 'orrify', because it was come up with by Orri Erling from Facebook; it basically creates a thin wrapper over a vector that gives you an iterator over it that is quite efficient for common cases. And then we have some specialized code to handle different types of compression, but it's very modular, so for any function or operator where you want to implement compressed execution, you can just add a specific kind of compressed execution to it. We don't have everything yet; we probably don't even want everything.
So in the case of dictionary compression, are you passing the dictionary along with the vector? Yeah, so the vectors in our system have shared pointers to a buffer, basically, and the buffer can hold anything; it's not just a bunch of bytes. It can also hold a dictionary, it can hold a lock as well, so we can have a lock on something there, or we can have a pinned buffer pool page. It's all shared pointers in the vectors, and that basically means we can have a dictionary that's referenced by a vector, and as long as that vector is alive, the dictionary will also always be alive. It doesn't actually have to physically go along with the vector; it can just be a pointer, but it can be kept alive as long as the vector is.
Okay. And when you say you're working on compressed storage, what are you thinking about? Well, there's a bunch of different things. From our group we have PFOR-delta, which will probably go in there for sure, and Peter actually came up with a new string compression method, which will also go in there for sure. Besides that, probably some simple things like RLE or dictionary and such. For now all our storage is just flat, but we are adding at least the simple things, you know: dictionary, RLE, PFOR-delta, and maybe that not-so-simple string compression. We're not going to use generic compression; we're not going to use snappy or anything, for example. It's 'snappy'. Snappy, sorry, snappy, thank you very much. Usually I'm the person correcting people's pronunciation at our office.
What do your catalogs look like? Are they tables, or some amazing data structures? Our catalog is just a bunch of unordered maps, but they are transaction-managed. So we actually have a separate code path for our catalog, but it is fully managed by the MVCC, because we really do want to have things like CREATE TABLE statements being tied to transactions, of course. The main reason we went with this sort of different representation is that we had a lot of problems with this in MonetDB, because in MonetDB they use the main storage engine to also handle the catalog, but the storage engine in MonetDB is very optimistic, which means that if you so much as look at the same table from two different threads, the whole system just blows up: it starts undoing things, saying 'transaction aborted, there was a potential conflict', you know. And that got really problematic when you wanted to handle the catalog, because if two threads would start a transaction and create a table, they would already have a conflict. So we kind of took away from that that if you're doing optimistic concurrency control, maybe you don't want to do that for the catalog. Okay; ours are in the catalog and they are transactional, or rather, our catalogs are in tables and they are transactional. So it sounds like it was sort of an engineering limitation of MonetDB rather than something more fundamental? No, they could be stored as tables as well; that's really what I'm saying. But was the issue we described fundamental to the DBMS? It sounds like it was an engineering problem, not fundamental. I think either way it's kind of fine. There is some duplicate code that we have in the catalog that deals with the transaction stuff, because it handles it in kind of a different way. I think if you're making a system that's pretty good with point updates, that's not super optimistic, it's fine to store them as tables as well; I think there's no fundamental objection to that. But for an optimistic engine like ours, we ran into these problems, and for us it's also kind of easy to just have an unordered map; it's kind of simple in a way as well. And it's easier to bootstrap, because there's no chicken-and-egg problem.
All right, I have two more questions, and then we should let you go, because it's late. So, you mentioned you're using shared pointers; have you found that to ever be a performance issue, or...? We just use unique pointers almost everywhere. Okay. The only case where we use shared pointers is actually in the vector buffer, and I thought about it a lot, and we ended up doing it because I saw no other good path to implement this. The reason is that if you have a vector that has, say, a dictionary, we have operations that just reference a vector, and you have to have some sort of shared ownership there, or you transfer the ownership, but sometimes that's a bit iffy. Because if you have, for example, a CASE statement: a CASE statement gets two vectors as inputs, or more, but say it's two. If you then have strings, you don't want to copy all the strings; you just want to keep pointers to them, right? So you have the CASE result that relies on this string area, and you have the original vector that relies on the string area, so it is inherently a shared thing, because you can have multiple vectors depending on the same data. But for the rest of the system we use unique pointers everywhere, and I really think that was the good choice as well: unique pointers everywhere. In Peloton we just... it was a wreck. We are doing something different in our new system; we don't use shared pointers, we mirror a pointer type and we manage memory separately.
My last question: it's an embedded system, so it's single threaded, right? Or how does that work? Right, so right now it's single threaded. Because you're running in Python, you just can't thread it? No, no, you can launch any threads you want; there's nothing stopping you from launching threads from the C code in Python. Only if you want to interface with Python things do you have to respect that, but you can easily run the database engine in parallel, which we also will do. So DuckDB is currently single threaded per query, but we have full support for concurrent queries and such, so you can run updaters and inserters and readers, however many you want, and most of them are not going to interfere with each other, unless you do single-row updates in write-heavy workloads; then you will have a lot of contention, but that's also not what we care about. What about background threads? If it's an embedded system, and you said you only have one thread... or I guess they do what they want, right? Yeah, so basically people can do whatever they want with the system. It's somewhat of a challenge, because people can mess with it, of course. Once we add multi-threading we will allow the user to limit it as well, because you don't want to burn eight cores on your laptop for no reason, right? Maybe the database is not that important to you.
So, do you have checkpoints? Like, a background thread taking a checkpoint? No, no. This is one thing that we decided on as well: we don't want any background things. The checkpoints, which are actually pretty ugly right now, will just trigger once the database reaches a certain size, and they will just be part of whatever query made the database exceed that threshold. So we made the call that we should not have any background anything. No background GC? No, no GC. Everything runs inside the query: if you call duckdb.query or whatever, everything happens in there; that's all DuckDB does, there's no background anything. It's kind of a philosophy, a little bit.
Just curious, can you elaborate on the decision of not having any background threads a bit more? Yeah, so the thing is... there's actually, I have some extra slides on this. Okay, I'm never going to find it... yeah, resource sharing, okay, I guess it's this slide. Can you guys still see the slides? Oh yes. Okay, resource sharing. So resource sharing is kind of a big thing for us, and not for standalone servers, because if you run a standalone database server, the machine is yours, basically; you take up the whole computer. But we probably don't just run on someone's laptop; we run inside another process, so we really want to respect that boundary somehow, and not just go wild. You don't want the Python process that you are typing interactive queries into to just randomly hang for a few seconds because your database decides it needs to checkpoint, you know, or because your database decides 'aha, you haven't done anything inside the database for a while, I'm going to checkpoint', which makes a lot of sense for a standalone server, but not so much otherwise. And we have been looking a bit into adaptively doing resource sharing as well, but it's something that's pretty tricky; it's probably not something we're going to really look into, but we have considered it. Okay.